4 research outputs found

Geodesic Distance Histogram Feature for Video Segmentation

Authors: A Kundu
EH Taralova
F Galasso
P Krähenbühl
T Brox
T Leung
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 31/03/2017

This paper proposes a geodesic-distance-based feature that encodes global information to improve video segmentation algorithms. The feature is a joint histogram of intensity and geodesic distances, where the geodesic distances are computed as the shortest paths between superpixels via their boundaries. We also incorporate adaptive voting weights and spatial pyramid configurations to add spatial information to the geodesic histogram feature, and show that this further improves results. The feature is generic and can be used as part of various algorithms. In experiments, we test the geodesic histogram feature by incorporating it into two existing video segmentation frameworks. This leads to significantly better performance on 3D video segmentation benchmarks on two datasets.

arXiv.org e-Print Archive

Crossref
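The geodesic histogram feature described in this entry's abstract can be illustrated with a minimal sketch. It assumes each superpixel has a single mean intensity in [0, 1], and that the cost of crossing between two adjacent superpixels is the strength of their shared boundary. Both are illustrative assumptions, as are the hypothetical helper names `geodesic_distances` and `geodesic_histogram`. The sketch omits the adaptive voting weights and spatial pyramid configurations, and it is not the authors' implementation.

```python
import heapq
from collections import defaultdict


def geodesic_distances(n, edges, source):
    """Dijkstra shortest-path distances from `source` over a superpixel graph.

    `edges` maps a pair (i, j) of adjacent superpixels to a non-negative
    boundary strength, used here (an assumption) as the cost of crossing
    their shared boundary. Unreachable superpixels get distance inf.
    """
    adj = defaultdict(list)
    for (i, j), w in edges.items():
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def geodesic_histogram(intensity, dist, n_int_bins=4, n_dist_bins=4, max_dist=1.0):
    """Normalized joint histogram over (intensity bin, geodesic-distance bin).

    Each reachable superpixel votes once (uniform weights, unlike the adaptive
    voting weights in the abstract). Distances >= max_dist fall in the last bin.
    """
    hist = [[0.0] * n_dist_bins for _ in range(n_int_bins)]
    count = 0
    for value, d in zip(intensity, dist):
        if d == float("inf"):
            continue  # skip superpixels not connected to the source
        ib = min(int(value * n_int_bins), n_int_bins - 1)
        db = min(int(d / max_dist * n_dist_bins), n_dist_bins - 1)
        hist[ib][db] += 1.0
        count += 1
    if count:
        hist = [[c / count for c in row] for row in hist]
    return hist
```

For example, with four superpixels in a chain, `geodesic_distances(4, {(0, 1): 0.1, (1, 2): 0.7, (2, 3): 0.1}, source=0)` assigns a large distance to superpixels 2 and 3 because the path must cross the strong boundary between 1 and 2. The resulting histogram therefore separates the dark, near region from the bright, far one. Computing one such histogram per source superpixel yields a descriptor that reflects global image layout rather than only local appearance.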

Spatio-Temporal Attention Models for Grounded Video Captioning

Author: A Rohrbach
D Oneata
EH Taralova
O Russakovsky
S Hochreiter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/10/2016

Automatic video captioning is challenging due to the complex interactions in dynamic real scenes. A comprehensive system would ultimately localize and track the objects, actions and interactions present in a video and generate a description that relies on temporal localization in order to ground the visual concepts. However, most existing automatic video captioning systems map from raw video data to a high-level textual description, bypassing localization and recognition and thus discarding potentially valuable information for content localization and generalization. In this work we present an automatic video captioning model that combines spatio-temporal attention and image classification by means of deep neural network structures based on long short-term memory. The resulting system is demonstrated to produce state-of-the-art results on the standard YouTube captioning benchmark, while also offering the advantage of localizing the visual concepts (subjects, verbs, objects) over space and time, with no grounding supervision.

arXiv.org e-Print Archive

Crossref

Lund University Publications
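The spatio-temporal attention mentioned in this entry's abstract can be illustrated with generic dot-product soft attention. It assumes a query vector, such as an LSTM hidden state, and one feature vector per (frame, region) pair. The dot-product scoring and the hypothetical helper names `soft_attention` and `most_attended` are assumptions for illustration; the abstract does not specify the paper's scoring function or architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, features):
    """Dot-product soft attention (an assumed scoring function).

    `features` holds one vector per (frame, region) pair, flattened into a list.
    Returns the attention weights (summing to 1) and the weighted context vector.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[k] for w, feat in zip(weights, features)) for k in range(dim)]
    return weights, context


def most_attended(weights):
    """Index of the (frame, region) pair with the highest attention weight."""
    return max(range(len(weights)), key=weights.__getitem__)
```

The context vector would feed the next decoding step. Because the weights are a distribution over (frame, region) pairs, `most_attended(weights)` gives a rough space-time location for the word being generated, which is one way such a model can localize concepts without grounding supervision.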

Improved Image Boundaries for Better Video Segmentation

Authors: A Khoreva
A Vazquez-Reina
EH Taralova
F Galasso
Margret Keuper
P Isola
P Krähenbühl
T Brox
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Graph-based video segmentation methods rely on superpixels as a starting point. While most previous work has focused on the construction of the graph edges and weights as well as on solving the graph partitioning problem, this paper focuses on better superpixels for video segmentation. We demonstrate by a comparative analysis that superpixels extracted from boundaries perform best, and show that boundary estimation can be significantly improved via image and time domain cues. With superpixels generated from our better boundaries, we observe consistent improvement for two video segmentation methods on two different datasets.

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Crossref

Archivio della ricerca- Università di Roma La Sapienza

MPG.PuRe

Towards Segmenting Consumer Stereo Videos: Benchmark, Baselines and Ensembles

Authors: A Geiger
C Zach
D Oneata
D Scharstein
DJ Butler
EH Taralova
J Shi
M Bleyer
P Arbelaez
P Ochs
Q Zhang
R Achanta
T Basha
T Brox
T Kanade
U von Luxburg
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Are we ready to segment consumer stereo videos? The amount of this data type is rapidly increasing, and it encompasses rich information from appearance, motion and depth cues. However, the segmentation of such data is still largely unexplored. First, we therefore propose a new benchmark (videos, annotations and metrics) to measure progress on this emerging challenge. Second, we evaluate several state-of-the-art segmentation methods and propose a novel ensemble method based on recent spectral theory, which combines existing image and video segmentation techniques in an efficient scheme. Finally, we propose a novel regressor, learnt to optimize stereo segmentation performance directly via a differentiable proxy, and integrate it into this model. The regressor makes our segmentation ensemble adaptive to each stereo video, and it outperforms the individual segmentations of the ensemble as well as a recent RGB-D segmentation technique.

arXiv.org e-Print Archive

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

Archivio della ricerca- Università di Roma La Sapienza

MPG.PuRe